Model Selection

Multi-frame Processing

# Multi-frame Processing

Tinyllava Video Qwen2.5 3B Group 16 512

TinyLLaVA-Video is a video understanding model based on Qwen2.5-3B and siglip-so400m-patch14-384, utilizing a grouped resampler for video frame processing

Xgen Mm Vid Phi3 Mini R V1.5 128tokens 8frames

xGen-MM-Vid (BLIP-3-Video) is an efficient compact vision-language model equipped with an explicit temporal encoder, specifically designed for video content understanding.

Safetensors English

Featured Recommended AI Models

AIbase

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

© 2025AIbase